DIRECTOR OF CENTRAL INTELLIGENCE
Human Resources Committee 


OFFICE OF THE CHAIRMAN    HRC-81-72
6 August 1981 
MEMORANDUM FOR: Members of the Human Resources Committee


25X1 FROM:


Chairman, Human Resources Committee


1. We now have sufficient guidance from the DCI and have 
had several exchanges with the DDCI which together enable us to 
formulate a coherent approach to the Community's human source
collection activities and problems. The principal elements of 
this approach, which will in the future govern my own activities 
as Chairman of the Human Resources Committee and that of my 
office as both the HRC staff and the HUMINT component of the 
Intelligence Community Staff, are the following: 


a. We will continue to provide support to the varied
human source collection activities of the Community members. 
In this connection, I again urge all members of the HRC to 
bring to my attention problems and recommendations for 
solutions which are not susceptible to in-house solutions, 
which cut across agency lines or which require cooperation 
and support of non-intelligence departments or agencies and 
in which action by the Committee or by myself personally 
may be helpful. As necessary, we will form subgroups to 
address specific HUMINT collection problems.


b. As the primary means for providing national 
level guidance, and as the basis for monitoring human 
source collection, we will complete about 65 National 
HUMINT Collection Plans (heretofore known as National 
HUMINT Tasking Plans). The plans, as now, will consist
of statements of prioritized collection objectives 
developed in coordination with policymakers and analysts 
and will provide for a rational assignment of primary 
and supporting collection responsibilities worked out 
jointly with the Intelligence Community and other USG 
agencies with collection potential. The Collection 
Capabilities Group will be redesignated the Collection 
Coordinating Group to reflect its actual role more accurately.
c. The principal means for monitoring human source
collection performance will be assessment (or evaluation) 
of our performance in satisfying the defined collection 
objectives. These assessments will be jointly conducted 
by my staff together with the evaluation or assessment 
components of the Community elements. In this connection,
I want to express my appreciation for the cooperation already
demonstrated by the principal Community members in getting 
this process started. At the same time, I again urge agencies 
which do not yet have their own evaluation or assessment 
capabilities to establish these soonest. As the process 
takes hold, we will progressively meld the FOCUS reviews 
of individual diplomatic mission reporting into the jointly 
conducted overall assessment of collection performance against 
a given topical or geographic target. We will also simplify 
the FOCUS process. From time to time and on a very selective
basis we will, again jointly with the appropriate Community
members, conduct special evaluations of human source collection
activities or programs. Here too, I solicit your recommenda- 
tions on areas where such studies would be helpful. 


d. Finally, we will continue to make a strong effort 
in the open-source area for the purpose of: (1) making certain 
that openly available sources are fully exploited, thus 
obviating the need for more complex and costly collection 
activities; (2) providing for maximum knowledge throughout 
the Community of open sources available to analysts and 
collectors; (3) eliminating undesirable redundancy; and 
(4) orienting open-source collection activities toward 
important national intelligence collection objectives. 


2. The DDCI has reviewed this memorandum and approves the
approach outlined above. Also, he has asked me to convey to you 
once again his strong interest in improving our human source 
collection capabilities. He is convinced that human source 
collection, with its variety and location both within and outside 
the Community, has suffered badly from personnel and funding 
degradation. He very much favors a concerted effort to rebuild 
our human source collection capabilities and has asked me to be 
particularly supportive of all soundly based actions or proposals
toward this end. 
DCI/RM-80-1951
15 September 1980 


MEMORANDUM FOR: Director, HUMINT Tasking Office 


FROM:
Program Assessment Office 


VIA: Director, Program Assessment Office


SUBJECT: DoD's Intelligence Report Evaluation Program--A 
Statistical Review 


REFERENCES: A. NFIP and Resource Guidance FY 82-86 (9 May 80, 
DCI/RM/3275-80)


B. Army Clandestine HUMINT--A Review (5 Mar 80,
DCI/RM 80-2001, Attachment II) 


C. DIA Response to CTS/HTO Questions (31 July 80) 


The DoD Intelligence Report (IR) evaluation program was developed to 
reflect the degree to which DoD Human Source reporting meets the 
requirements levied upon it. The program calls for roughly 20% of all IRs 
to be evaluated. Some IRs are automatically evaluated due to the
collection requirements that drive them; others are evaluated at the 
collector's initiative, while still others are evaluated at the initiative 
of DoD analysts. 


DoD analysts provide an IR evaluation by subjectively categorizing
the value of an IR as "high", "moderate", "low", "none" or "cannot
judge".


This statistical review evaluates the soundness of DoD's IR 
evaluation program. (S) 


Background: 


Statistically, samples are selected from a larger population 
according to some rule or plan. Generally, samples are obtained by one of 
two methods: those selected by some form of subjective judgment, and those
selected according to some chance mechanism (such as random sampling).
A good sample is one from which generalizations to the population can
be accurately and precisely made. To generalize from a sample to a
population, the laws of mathematical probability must apply--random 
sampling assures that these laws do apply. For this reason random samples 
are preferred over judgment samples. 
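As a rough illustration of why a chance mechanism matters, the Python sketch below compares a random sample with a judgment sample drawn from the same population. The population of 1,000 ratings, its category shares, and the rule that evaluators pick "high" and "low" reports first are all invented for illustration; none of it is DoD data.

```python
import random
from collections import Counter

# Hypothetical population of 1,000 IR ratings (shares invented for illustration).
population = ["high"] * 150 + ["moderate"] * 600 + ["low"] * 200 + ["none"] * 50

def shares(ratings):
    """Percentage of the ratings falling in each category."""
    counts = Counter(ratings)
    return {cat: 100 * n / len(ratings) for cat, n in counts.items()}

def random_sample(pop, n, seed=0):
    """Chance mechanism: every report has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(pop, n)

def judgment_sample(pop, n):
    """Hypothetical judgment rule: evaluators take the 'interesting'
    (high- or low-value) reports first and never reach the moderate ones."""
    interesting = [r for r in pop if r in ("high", "low")]
    return interesting[:n]

print(shares(population))
print(shares(random_sample(population, 200)))
print(shares(judgment_sample(population, 200)))
```

Whatever seed is used, the random sample tends to land near the population's 60% "moderate" share, while the judgment sample contains no "moderate" reports at all, so no arithmetic on it can recover the population figure.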


To generalize accurately and precisely from a sample to a population, 
the uncertainties in the sample must be understood. There are two 
components of sample uncertainty: reliability (numerical precision) and 
validity (accuracy or realism). Reliability is controlled for the most 
part by sample size, and can be calculated from the data at hand. 
Validity, however, cannot be judged from the data and can be controlled 
only before sampling through sound experimental design. (U) 


Discussion: 


DoD's sample size of roughly 20% provides sufficiently precise
estimates, IF THE SAMPLE IS VALIDLY CHOSEN. The percentage of IRs rated
as having high value, for example, is precise to better than ±3% (95%
Confidence Interval) based on the 20% sampling (see Appendix). In fact, a
sample as small as 500 evaluations, if chosen properly, will provide
precision to better than ±5% (95% Confidence Interval).
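The precision figures above can be checked with the Appendix formula. The sketch below assumes simple random sampling and uses the memo's "twice the square root" approximation for a 95% interval; the function names are hypothetical, and p = 50 is used as the worst case because P(100 - P) is largest there.

```python
import math

def ci95_halfwidth(p, n):
    """Approximate 95% half-width, in percentage points, for a sample
    percentage p (0-100) from n randomly chosen evaluations:
    twice the square root of p(100 - p)/n, as in the Appendix."""
    return 2 * math.sqrt(p * (100 - p) / n)

def min_sample_size(halfwidth, p=50):
    """Smallest n whose approximate 95% half-width is at most `halfwidth`.
    p = 50 is the worst case (largest variance)."""
    return math.ceil(4 * p * (100 - p) / halfwidth ** 2)

print(ci95_halfwidth(50, 500))  # about 4.47, i.e. better than +/-5%
print(min_sample_size(5))       # 400
print(min_sample_size(3))       # 1112
```

The last figure implies that a properly drawn sample of roughly 1,100 evaluations or more would meet the ±3% figure for any category; the memo does not state the total number of IRs, so the sketch cannot confirm the 20% figure directly.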


It must be noted parenthetically that the precision of sample
estimates depends on the number of IRs sampled (improving roughly with
its square root) and not on the percentage of IRs sampled. Reference C
states that samples were taken from each of some 120 individual
collection entities. Care must be taken when examining each of these
collection entities separately, since their sample sizes may be quite
small. On average, one would expect the precision of estimates within a
collection entity to be on the order of ±10-20% (95% Confidence
Interval).
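The point that precision follows the number sampled rather than the sampling percentage can be illustrated with the same approximation. The population sizes below are hypothetical; the per-entity sample sizes of 25 and 100 are simply the worst-case sizes that correspond to the ±20% and ±10% bounds cited in the text. The sketch ignores the finite-population correction, which at a 20% sampling fraction would narrow each interval by only about 10%.

```python
import math

def ci95_halfwidth(p, n):
    """Approximate 95% half-width (percentage points) for percentage p from n evaluations."""
    return 2 * math.sqrt(p * (100 - p) / n)

# Same 20% sampling rate, very different precision (population sizes hypothetical).
for population_size in (500, 5_000, 50_000):
    n = population_size // 5  # 20% of the population
    print(population_size, n, round(ci95_halfwidth(50, n), 1))

# Worst-case per-entity sample sizes matching the +/-10-20% range in the text.
for n in (25, 100):
    print(n, ci95_halfwidth(50, n))
```

A 20% sample of a small collection entity is therefore far less precise than a 20% sample of the whole IR population.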


However, it is not insufficient reliability but insufficient validity
that undermines DoD's evaluation program. There are three primary causes 
of invalidity: 


(1) Systematic errors. According to Reference C, there is a
tendency to initiate evaluations of high- or low-value reports at
the expense of reports rated moderate in value. This practice
results in the systematic elimination of a portion of the
population and a consequent bias in inferences made from the
sample. Reference B surfaces another source of systematic
error: the inordinate number of high evaluations that, upon
closer examination, appear to have been unwarranted. The effects
of such systematic overrating cannot be removed through
statistical analysis and thus further undermine the validity of
the inferences drawn from the sample.
(2) Mismatch between sample and population. Reference B also
isolates a serious mismatch between the sample and the
population it purports to represent: the sample was taken
primarily from the population of mid-level DoD analysts, while
inferences are drawn about the population of consumers
(policymakers and senior analysts both inside and outside DoD).
Since the value of a report to a mid-level analyst appears to be
different from the value of the same report to other consumers
(Reference B), one must seriously question the use to which
DoD's summaries can be put.


Furthermore, DoD's evaluation sample does not appear to match 
the total IR population in several other respects. The sample 
was not randomly chosen (i.e., each report did not have an equal 
chance of being evaluated), thus invalidating the mathematical 
basis for making inferences. As noted before, judgment sampling
is not random, and according to Reference C, "analyst
initiative" evaluations are often intentionally biased to
"reduce the ... IRs which ... are evaluated as being of low or
no value." Likewise, it is not clear that special and
initiative evaluations are representative of the total IR
population, since they represent reports of some special, not
random, interest.


Failure to attend to the representativeness of the sample can 
lead to serious underestimates of uncertainty and consequent 
overoptimism about the stability and realism of population 
inferences. And estimates for which the accuracy is unknown can 
be quite misleading. 


(3) Correlated evaluations. If one analyst evaluates a
disproportionate share of reports and tends to rate reports
higher or lower than other analysts, his evaluations may
speciously inflate (or deflate) the estimated worth of IRs. His
evaluations are said to be correlated, and correlated
evaluations lower the validity of an analysis. Likewise, if
several evaluations are performed on a single requirement (or
similar requirements), such correlated evaluations again tend
to alter population estimates artificially. There is potential
for such correlated evaluations in "analyst initiative" reporting. (S)
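Two of the invalidating mechanisms above can be made concrete with simple expected-value arithmetic. In the Python sketch below, every share, selection rate, and analyst workload is hypothetical: the first part evaluates "moderate" reports less often than others (systematic error), and the second gives one analyst who rates generously a disproportionate share of the evaluations (correlated evaluations).

```python
# Illustrative sketch only: all shares, rates, and workloads are hypothetical.

def expected_sample_shares(pop_shares, selection_rate):
    """Expected % of the evaluated sample in each category when category c
    is evaluated at selection_rate[c] (unequal rates = non-random sample)."""
    weights = {c: pop_shares[c] * selection_rate[c] for c in pop_shares}
    total = sum(weights.values())
    return {c: 100 * w / total for c, w in weights.items()}

def pooled_high_share(analysts):
    """Pooled % rated "high" from (number_of_evaluations, rate_high) pairs."""
    total = sum(n for n, _ in analysts)
    return 100 * sum(n * rate for n, rate in analysts) / total

# (1) Systematic error: moderate reports are evaluated less often.
population = {"high": 15, "moderate": 60, "low": 20, "none": 5}  # percent
equal_rates = {c: 0.20 for c in population}
skewed_rates = {"high": 0.30, "moderate": 0.10, "low": 0.30, "none": 0.20}
print(expected_sample_shares(population, equal_rates))
print(expected_sample_shares(population, skewed_rates))

# (3) Correlated evaluations: three analysts rate 20% "high", a fourth 50%.
even_load = [(150, 0.20)] * 3 + [(150, 0.50)]
heavy_load = [(100, 0.20)] * 3 + [(300, 0.50)]
print(pooled_high_share(even_load), pooled_high_share(heavy_load))
```

In the first case the "moderate" share drops from 60% to about 34% even though no individual rating is wrong, and in the second the pooled "high" share rises from 27.5% to 35% purely because of workload. Neither effect shrinks as the sample grows, which is why sample size alone cannot restore validity.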
Conclusions:


If the intent is to understand the value to consumers of IRs as a 
whole, mandatory evaluation must be randomly assigned to 10% or so 
(depending upon the accuracy desired) of all reporting to match 
sample to population and to provide for sufficient reliability. 
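A minimal Python sketch of random assignment, assuming each IR carries a unique identifier (the "IR-00001" format, the 2,000-report batch, and the seed are hypothetical). Selection is made by a chance mechanism before any report is read, so every report has the same chance of being evaluated and neither collectors nor analysts choose which reports enter the sample.

```python
import random

def assign_mandatory_evaluations(report_ids, fraction=0.10, seed=None):
    """Randomly flag a fixed fraction of reports for mandatory evaluation.
    Simple random sampling without replacement: every report has the
    same chance of being chosen, regardless of its content or source."""
    rng = random.Random(seed)
    k = round(len(report_ids) * fraction)
    return set(rng.sample(report_ids, k))

reports = [f"IR-{i:05d}" for i in range(1, 2001)]  # hypothetical report IDs
selected = assign_mandatory_evaluations(reports, 0.10, seed=1981)
print(len(selected), sorted(selected)[:5])
```

Recording the seed makes the selection reproducible and auditable.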
Furthermore, since mid-level analysts provided evaluations from their 
own perspective, results will be valid only for these analysts. 
Inferences about other consumers are invalid unless it can be shown 
that the attitudes and perspective of mid-level analysts are like 
those of the other consumers. 


"Initiative" and specially requested evaluations, while they may be
useful for other purposes, should not be included in the data
analysis because of their systematic biases and potential for correlated
evaluations.


The assertion in Reference B that the Intelligence Community "cannot 
rely upon such evaluations for an objective view of the worth of the 
reporting" appears to be based on an invalidating mismatch between 
sample and population. 


The violation of such fundamental principles of validity renders the DoD
evaluation program of questionable value for estimating the worth of
intelligence reporting to consumers. (S)
APPENDIX. Statistical Foundation for Estimates of Precision.


DoD defines the value of an IR as either "high", "moderate", "low",
"none" or "cannot judge". These categories form a well-defined
statistical population known as a multinomial population. When a random
sample of IRs is sorted into these multinomial categories, the percentage
of the total sample falling in each category can easily be calculated. The
variance (a measure of precision) of each percentage, P, is defined as:
Variance = [P(100 - P)] ÷ N, where N is the total sample size. For example,
if 70% of 2,000 evaluations are rated as "moderate" in value, the
variance of this 70% is given by: [70(30)] ÷ 2000 = 1.05. The half-width
of a 95% Confidence Interval is approximated by twice the square root of
this number, or about 2. Therefore, the 70% is precise to within ±2% (at a
95% level of confidence). In other words, if the true proportion were 70%
and this evaluation were repeated 100 times with fresh random samples, one
would expect the proportion of "moderate" ratings to fall between 68% and
72% about 95 times, and outside that range only about 5 times. (S)
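The Appendix arithmetic can be reproduced in a few lines of Python. The sketch also checks the "repeated 100 times" reading by simulation, under the stated assumption that the true population share of "moderate" ratings is exactly 70%; the function names, repetition count, and seed are arbitrary choices.

```python
import math
import random

def variance_pct(p, n):
    """Variance of a sample percentage p (0-100) from n random evaluations."""
    return p * (100 - p) / n

def ci95_halfwidth(p, n):
    """The Appendix's approximation: twice the square root of the variance."""
    return 2 * math.sqrt(variance_pct(p, n))

print(variance_pct(70, 2000))               # 1.05
print(round(ci95_halfwidth(70, 2000), 2))   # 2.05, i.e. about +/-2%

def simulated_coverage(true_share=0.70, n=2000, reps=500, seed=0):
    """Fraction of repeated random samples whose observed percentage falls
    within 68-72%, assuming the true population share is `true_share`."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(reps):
        k = sum(rng.random() < true_share for _ in range(n))
        if 68 * n <= 100 * k <= 72 * n:  # integer test for 68% <= k/n <= 72%
            inside += 1
    return inside / reps

print(simulated_coverage())
```

With these settings the simulated share of samples falling within 68-72% should come out close to 0.95.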